
    Stochastic recursive inclusion in two timescales with an application to the Lagrangian dual problem

    In this paper we present a framework to analyze the asymptotic behavior of two-timescale stochastic approximation algorithms, including those with set-valued mean fields. This paper builds on the works of Borkar and of Perkins & Leslie. The framework presented herein is more general than the synchronous two-timescale framework of Perkins & Leslie, yet the assumptions involved are easily verifiable. As an application, we use this framework to analyze the two-timescale stochastic approximation algorithm corresponding to the Lagrangian dual problem in optimization theory.
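    To make the coupled recursions concrete, here is a minimal sketch of a two-timescale stochastic approximation run on a toy Lagrangian dual problem. The objective, constraint, step sizes and noise model are illustrative assumptions, not the paper's setting.

        import numpy as np

        # Minimal sketch (illustrative assumptions, not the paper's setting):
        #   minimise f(x) = (x - 2)^2   subject to   g(x) = x - 1 <= 0
        #   L(x, lam) = f(x) + lam * g(x)
        # Fast timescale: primal descent in x.  Slow timescale: dual ascent in lam.

        rng = np.random.default_rng(0)
        x, lam = 0.0, 0.0

        for n in range(1, 50001):
            a_n = 1.0 / n ** 0.6      # fast step size (primal descent)
            b_n = 1.0 / n             # slow step size (dual ascent), b_n / a_n -> 0

            grad_x = 2.0 * (x - 2.0) + lam    # dL/dx
            grad_lam = x - 1.0                # dL/dlam = g(x)

            # noisy gradient observations drive both recursions
            x = x - a_n * (grad_x + rng.normal(scale=0.1))
            lam = max(0.0, lam + b_n * (grad_lam + rng.normal(scale=0.1)))

        print(f"x ~ {x:.3f}, lambda ~ {lam:.3f}")   # expect x near 1 and lambda near 2

    On this toy problem the fast iterate tracks the primal minimiser for the current multiplier while the slowly varying multiplier performs dual ascent, which is the separation of timescales the framework is built around.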

    Rainbow Connection Number and Radius

    The rainbow connection number, rc(G), of a connected graph G is the minimum number of colours needed to colour its edges so that every pair of its vertices is connected by at least one path in which no two edges are coloured the same. In this note we show that for every bridgeless graph G with radius r, rc(G) <= r(r + 2). We demonstrate that this bound is the best possible for rc(G) as a function of r, not just for bridgeless graphs but also for graphs of any stronger connectivity. It may be noted that for a general 1-connected graph G, rc(G) can be arbitrarily larger than its radius (the star graph, for instance). We further show that for every bridgeless graph G with radius r and chordality (the length of a largest induced cycle) k, rc(G) <= rk. It is known that computing rc(G) is NP-hard [Chakraborty et al., 2009]. Here, we present an (r+3)-factor approximation algorithm that runs in O(nm) time and a (d+3)-factor approximation algorithm that runs in O(dm) time to rainbow colour any connected graph G on n vertices, with m edges, diameter d and radius r.
    Comment: Revised preprint with an extra section on an approximation algorithm. arXiv admin note: text overlap with arXiv:1101.574
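    The definition can be checked mechanically on small graphs. The sketch below computes rc(G) by brute force for a 4-cycle directly from the definition; the example graph and the exhaustive search are illustrative only and are not the paper's approximation algorithm.

        from itertools import product

        # Brute-force rc(G) for a tiny example graph (illustrative assumption: C4).
        edges = [(0, 1), (1, 2), (2, 3), (3, 0)]
        n = 4

        def rainbow_connected(colour):
            # every pair of vertices needs a path whose edges all get distinct colours
            adj = {v: [] for v in range(n)}
            for idx, (u, v) in enumerate(edges):
                adj[u].append((v, idx))
                adj[v].append((u, idx))

            def exists_rainbow_path(s, t):
                # search over (vertex, set of used colours); fine for tiny graphs
                stack, seen = [(s, frozenset())], set()
                while stack:
                    v, used = stack.pop()
                    if v == t:
                        return True
                    if (v, used) in seen:
                        continue
                    seen.add((v, used))
                    for w, e in adj[v]:
                        if colour[e] not in used:
                            stack.append((w, used | {colour[e]}))
                return False

            return all(exists_rainbow_path(s, t)
                       for s in range(n) for t in range(s + 1, n))

        def rc(max_colours=len(edges)):
            for k in range(1, max_colours + 1):
                if any(rainbow_connected(c) for c in product(range(k), repeat=len(edges))):
                    return k

        print(rc())   # expect 2 for the 4-cycle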

    3DPG: Distributed Deep Deterministic Policy Gradient Algorithms for Networked Multi-Agent Systems

    We present Distributed Deep Deterministic Policy Gradient (3DPG), a multi-agent actor-critic (MAAC) algorithm for Markov games. Unlike previous MAAC algorithms, 3DPG is fully distributed during both training and deployment. 3DPG agents calculate local policy gradients based on the most recently available local data (states, actions) and local policies of other agents. During training, this information is exchanged over a potentially lossy and delaying communication network. The network therefore induces Age of Information (AoI) for data and policies. We prove the asymptotic convergence of 3DPG even in the presence of potentially unbounded AoI. This is an important step towards practical online and distributed multi-agent learning, since 3DPG does not assume that information is available deterministically. We analyze 3DPG in the presence of policy and data transfer under mild practical assumptions. Our analysis shows that 3DPG agents converge to a local Nash equilibrium of Markov games in terms of utility functions expressed as the expected value of the agents' local approximate action-value functions (Q-functions). The expectations of the local Q-functions are with respect to limiting distributions over the global state-action space shaped by the agents' accumulated local experiences. Our results also shed light on the policies obtained by general MAAC algorithms. We show through a heuristic argument and numerical experiments that 3DPG improves convergence over previous MAAC algorithms that use old actions instead of old policies during training. Further, we show that 3DPG is robust to AoI; it learns competitive policies even with large AoI and low data availability.
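    The exchange pattern described above, where each agent updates from the most recently received and possibly stale policies of the other agents, can be sketched as follows. The linear actors, the stand-in gradient step and the drop probability are illustrative assumptions, not the 3DPG networks or its actual update rule.

        import numpy as np

        # Sketch of AoI-tolerant policy exchange (illustrative, not the 3DPG update).
        rng = np.random.default_rng(1)
        N_AGENTS, STATE_DIM, ACT_DIM = 3, 4, 2
        DROP_PROB = 0.3                                   # lossy, delaying network

        actors = [rng.normal(size=(ACT_DIM, STATE_DIM)) * 0.1 for _ in range(N_AGENTS)]
        # freshest copy each agent holds of every other agent's actor, plus its age
        local_view = [[w.copy() for w in actors] for _ in range(N_AGENTS)]
        aoi = np.zeros((N_AGENTS, N_AGENTS), dtype=int)

        def local_update(i, state):
            # agent i acts with its own current policy but evaluates the joint action
            # using the stale copies of the other agents' policies it currently holds
            joint_action = np.concatenate([
                (actors[j] if j == i else local_view[i][j]) @ state
                for j in range(N_AGENTS)
            ])
            # stand-in for agent i's deterministic policy gradient step
            grad = np.outer(-joint_action[i * ACT_DIM:(i + 1) * ACT_DIM], state)
            actors[i] -= 0.01 * grad

        for t in range(1000):
            state = rng.normal(size=STATE_DIM)
            for i in range(N_AGENTS):
                local_update(i, state)
            # lossy policy exchange: on a drop the old copy is reused and simply ages
            for i in range(N_AGENTS):
                for j in range(N_AGENTS):
                    if i != j and rng.random() > DROP_PROB:
                        local_view[i][j] = actors[j].copy()
                        aoi[i, j] = 0
                    elif i != j:
                        aoi[i, j] += 1

        print("max age of information seen:", aoi.max())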

    Deep Reinforcement Learning for Wireless Sensor Scheduling in Cyber-Physical Systems

    In many cyber-physical systems, we encounter the problem of remote state estimation of geographically distributed and remote physical processes. This paper studies the scheduling of sensor transmissions to estimate the states of multiple remote, dynamic processes. Information from the different sensors has to be transmitted to a central gateway over a wireless network for monitoring purposes, where typically fewer wireless channels are available than there are processes to be monitored. For effective estimation at the gateway, the sensors need to be scheduled appropriately, i.e., at each time instant one needs to decide which sensors have network access and which ones do not. To address this scheduling problem, we formulate an associated Markov decision process (MDP). This MDP is then solved using a Deep Q-Network, a recent deep reinforcement learning algorithm that is both scalable and model-free. We compare our scheduling algorithm to popular scheduling algorithms such as round-robin and reduced-waiting-time, among others. Our algorithm is shown to significantly outperform these algorithms in many example scenarios.
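    A minimal sketch of this kind of scheduling MDP is given below. The staleness-based state, the quadratic penalty and the greedy longest-waiting baseline (in the spirit of the reduced-waiting-time policy mentioned above) are illustrative assumptions rather than the paper's exact model.

        import numpy as np
        from itertools import combinations

        # Toy scheduling MDP (illustrative assumptions, not the paper's exact model):
        # state = time since each process was last updated, action = which sensors
        # get the scarce channels, reward = penalty on the resulting staleness.
        N_PROCESSES, N_CHANNELS = 6, 2

        def reset():
            return np.zeros(N_PROCESSES)          # time since last update, per process

        def step(state, scheduled):
            nxt = state + 1.0
            nxt[list(scheduled)] = 0.0            # scheduled sensors refresh their estimate
            reward = -np.sum(nxt ** 2)            # staleness grows estimation error
            return nxt, reward

        # greedy baseline: always schedule the processes that have waited the longest
        state, total = reset(), 0.0
        for t in range(100):
            action = tuple(np.argsort(state)[-N_CHANNELS:])
            state, r = step(state, action)
            total += r
        print("baseline return over 100 slots:", total)

        # a DQN agent would instead learn Q(state, action) over this discrete action space
        n_actions = len(list(combinations(range(N_PROCESSES), N_CHANNELS)))
        print("number of possible schedules per slot:", n_actions)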